[pull] master from ggml-org:master by pull[bot] · Pull Request #69 · CrazyForks/llama.cpp

pull · 2026-05-16T15:42:25Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

…u trigger and items (#22736) * fix tab order on attach button, and dont focus on disabled mennu item * add a11y tests

* Fix handling of MCP resource template parameters * Fix formatting for uri-template.test.ts --------- Co-authored-by: kuba <kuba@laptop.local.net>

* spec: support MTP * fix batch size * rename files * cont : simplify (#7) * MTP: clean-up (#9) * MTP: clean-up * review: use llama_context_type instead of llama_graph_type * review: remove llama_model_has_mtp * review: fix convert issues * convert: fix pycheck * review: formatting * use `mtp-` for identifying mtp models * convert: fix mtp conversion * mtp -> draft-mtp * remove unused llama_arch * add need_embd in speculative * llama: allow partial seq_rm for GDN models for speculative decoding Currently speculative checkpoint needs to restart from a checkpoint after some draft tokens are not accepted, this leads to some wastage in running the target again. This PR adds the ability to rollback upto `draft_max` by storing the GDN intermediates. * fix pending state * vulkan: add GDN partial rollback * meta: extend check to axis 1 * metal: add GDN partial rollback Extend the gated delta net kernel to store intermediate states for partial rollback support on the Metal backend. - Add K (snapshot slot count) as a function constant - Read input state from slot 0 of the 3D state tensor - Write intermediate states to different slots during token loop - For K=1, maintain backward-compatible single-slot behavior Ref: 8c05923 Assisted-by: llama.cpp:local pi * delta_net_base: use ggml_pad instead of new_tensor * review: add need_rs_seq * review: rename part_bounded to n_rs * review: deslop comments * review: rename, add asserts * server : adjust checkpoint logic (#11) * server : adjust checkpoint logic * cont : rm asserts * server-context: fix early exit * spec : fix compatibility with n-gram and add TODOs (#13) * metal : cleanup * llama : fix faulty bitwise check in recurrent memory * server : disable RS-based MTP in combination with other spec types * spec : add TODOs * cont : fix comment * cont : update comment * common : fix logic for ngram + mtp compat * llama-memory: enable checkpointing with partial rollback * cont: add test-case for loading into a dirty ctx * llama-memory-recurrent: clear rs_idx in clear * download: fix mtp path * llama-arch: fix enorm op * docs: update docs * conversion: fix type annotations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

In `tools/ui/README.md`, update the relative links, now that the `README.md` file has been moved from `tools/server/webui/` to `tools/ui/`. See 59778f0.

That's always how it's done: https://github.com/search?q=path%3ACMakeLists.txt%20%22%24%7BCMAKE_INSTALL_LIBDIR%7D%2Fpkgconfig%22&type=code

…/1477) For a given output position j on the time axis, only input positions i such that i*s0 <= j < i*s0 + K contribute -- i.e. i in [ceil((j - K + 1)/s0), floor(j/s0)] intersected with [0, IL-1]. That's at most ceil(K/s0) values (typically 2 for stride==K/2 transposed convs). The current kernel iterates the full IL range and filters with an `if`, amplifying per-thread work by IL/ceil(K/s0) (~160x for IL=320, K=10, s0=5 -- a representative codec-decoder shape). On Apple M1 the wasted work trips the macOS GPU watchdog (kIOGPUCommandBufferCallbackErrorImpactingInteractivity) on long graphs. Compute i_min, i_max analytically before the inner loop and iterate only [i_min, i_max]. Output is bit-identical (same multiplies and adds in the same order); loop bound shrinks by IL/ceil(K/s0). Tested on M1 with a downstream consumer running a TTS codec at full T_codec; end-to-end codec decode ~3-4x faster, zero watchdog hits across long synthesis runs vs ~30% pre-patch.

* feat: Add request timeout for MCP tool calls in llama-ui * feat: MCP Settings tab with max timeout setting

vignesh191 and others added 10 commits May 16, 2026 12:00

webui : [ChatFormActionAdd][a11y] fix accessibility issues in add men…

1428004

…u trigger and items (#22736) * fix tab order on attach button, and dont focus on disabled mennu item * add a11y tests

ui: Fix handling of MCP resource template parameters (#23117)

b81c2cd

* Fix handling of MCP resource template parameters * Fix formatting for uri-template.test.ts --------- Co-authored-by: kuba <kuba@laptop.local.net>

vendor : update cpp-httplib to 0.45.0 (#23103)

18675b6

ui: Correct links in tools/ui/README.md [no ci] (#23139)

25b1bc9

In `tools/ui/README.md`, update the relative links, now that the `README.md` file has been moved from `tools/server/webui/` to `tools/ui/`. See 59778f0.

ggml: install ggml.pc in <libdir>/pkgconfig (ggml/1480)

2eb3e6b

That's always how it's done: https://github.com/search?q=path%3ACMakeLists.txt%20%22%24%7BCMAKE_INSTALL_LIBDIR%7D%2Fpkgconfig%22&type=code

ggml : bump version to 0.12.0 (ggml/1494)

e6c37a1

sync : ggml

3a92bc9

ui: Add request timeout for MCP tool calls (#23138)

0253fb2

* feat: Add request timeout for MCP tool calls in llama-ui * feat: MCP Settings tab with max timeout setting

pull Bot locked and limited conversation to collaborators May 16, 2026

pull Bot added the ⤵️ pull label May 16, 2026

pull Bot merged commit 0253fb2 into CrazyForks:master May 16, 2026
11 of 21 checks passed

github-actions Bot added Nvidia GPU testing examples python server ggml Apple Metal Vulkan script server/ui model labels May 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pull] master from ggml-org:master#69

[pull] master from ggml-org:master#69
pull[bot] merged 10 commits into
CrazyForks:masterfrom
ggml-org:master

pull Bot commented May 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

Conversation

pull Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

pull Bot commented May 16, 2026 •

edited

Loading